Estimating individual treatment effect: generalization bounds and algorithms
Abstract
There is intense interest in applying machine learning to problems of causal inference in healthcare, economics, education, and other fields. In particular, individual-level causal inference has applications such as precision medicine and personalized advertising. We give a new theoretical analysis and a family of algorithms for estimating individual treatment effect (ITE) from observational data. The algorithm learns a "balanced" representation such that the induced treated and control distributions look similar. We give a novel, simple and intuitive generalization error bound showing that the expected ITE estimation error of a representation is bounded by the sum of the standard generalization error of that representation and the distance between the treated and control distributions induced by the representation. We use Integral Probability Metrics to measure distances between distributions, deriving explicit bounds for the Wasserstein and Maximum Mean Discrepancy (MMD) distances. Experiments on real and simulated data show that the new algorithms match or outperform state-of-the-art methods.
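The following Python sketch illustrates the balancing idea in a rough way. It computes an empirical (biased) squared MMD with a Gaussian kernel between the treated and control representations, then adds that value to a mean factual loss, so the objective has the shape "prediction error plus distribution distance" that the bound describes. Several parts are assumptions made for illustration:

- The function names `rbf_kernel`, `mmd2_biased` and `balanced_objective` are hypothetical.
- The kernel bandwidth `sigma` and the weight `alpha` are also hypothetical choices.
- The representation is taken as given rather than learned.
- Squared MMD is used as the penalty. The paper's exact penalty (MMD, squared MMD, or Wasserstein) and its weighting may differ.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def rbf_kernel(a: Vector, b: Vector, sigma: float = 1.0) -> float:
    """Gaussian (RBF) kernel: exp(-||a - b||^2 / (2 * sigma^2))."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2_biased(xs: Sequence[Vector], ys: Sequence[Vector], sigma: float = 1.0) -> float:
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    k_xx = sum(rbf_kernel(a, b, sigma) for a in xs for b in xs) / (len(xs) ** 2)
    k_yy = sum(rbf_kernel(a, b, sigma) for a in ys for b in ys) / (len(ys) ** 2)
    k_xy = sum(rbf_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


def balanced_objective(
    factual_losses: Sequence[float],
    reps: Sequence[Vector],
    treated: Sequence[int],
    alpha: float,
    sigma: float = 1.0,
) -> float:
    """Mean factual loss plus alpha * squared MMD between treated and control reps.

    `reps` stands in for the representation Phi(x) of each unit; in this
    hypothetical sketch it is supplied directly instead of being learned.
    """
    treated_reps: List[Vector] = [r for r, t in zip(reps, treated) if t == 1]
    control_reps: List[Vector] = [r for r, t in zip(reps, treated) if t == 0]
    mean_factual = sum(factual_losses) / len(factual_losses)
    return mean_factual + alpha * mmd2_biased(treated_reps, control_reps, sigma)
```

In a full method, `reps` would be the output of a learned map applied to each covariate vector. The objective would then be minimized jointly over that map and the outcome model. Here the code only evaluates the penalty for fixed inputs. With `alpha=0.0` the objective reduces to the plain factual loss. Larger values push the treated and control representations to look alike.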
Similar resources
Supplemental Materials for: Estimating individual treatment effect: generalization bounds and algorithms
Equality (1) holds because we assume that $Y_t$ and $t$ are independent conditioned on $x$. Equality (2) follows from the consistency assumption. Finally, the last equation is composed entirely of observable quantities and can be estimated from data, since we assume $0 < p(t = 1 \mid x) < 1$ for all $x$. Definition A2. Let $p^{t=1}(x) := p(x \mid t = 1)$ and $p^{t=0}(x) := p(x \mid t = 0)$ denote respectively the treatment and control dis...
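The identification argument that these sentences summarize is the standard one. It rests on the assumptions they name:

- conditional independence of the potential outcomes $Y_t$ and $t$ given $x$;
- consistency, meaning $Y = Y_t$ for the treatment actually received;
- overlap, meaning $0 < p(t = 1 \mid x) < 1$.

Under those assumptions it can be sketched as follows. This is a reconstruction for illustration only. The symbol $\tau(x)$ for the individual treatment effect and the placement of labels (1) and (2) are assumptions, and the supplement's own derivation may be typeset differently.

```latex
\begin{aligned}
\tau(x) &:= \mathbb{E}[Y_1 - Y_0 \mid x]
         = \mathbb{E}[Y_1 \mid x] - \mathbb{E}[Y_0 \mid x] \\
        &\overset{(1)}{=} \mathbb{E}[Y_1 \mid x, t = 1] - \mathbb{E}[Y_0 \mid x, t = 0] \\
        &\overset{(2)}{=} \mathbb{E}[Y \mid x, t = 1] - \mathbb{E}[Y \mid x, t = 0]
\end{aligned}
```

Overlap ensures that both conditional expectations in the last line are defined for every $x$.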
A generalization of Profile Hidden Markov Model (PHMM) using one-by-one dependency between sequences
The Profile Hidden Markov Model (PHMM) can be poor at capturing dependency between observations because of the statistical assumptions it makes. To overcome this limitation, the dependency between residues in a multiple sequence alignment (MSA), which is the representative of a PHMM, can be combined with the PHMM. Based on the fact that sequences appearing in the final MSA are written based on th...
Generalization Bounds and Complexities Based on Sparsity and Clustering for Convex Combinations of Functions from Random Classes
A unified approach is taken for deriving new data-dependent generalization bounds for several classes of algorithms explored in the existing literature by different approaches. This unified approach is based on an extension of Vapnik's inequality for VC classes of sets to random classes of sets, that is, classes depending on the random data, invariant under permutation of the data and possessing...
Estimating Model Limitation in Financial Markets
We introduce bounds on the generalization ability when learning with noisy data. These results quantify the trade-off between the amount of data and the noise level in the data. Our results can be used to derive a method for estimating the model limitation for a given learning problem. Changes in model limitation can then be used to detect a change in market volatility. Our results apply to linea...
The Coupon Subset Collection Problem
The coupon subset collection problem is a generalization of the classical coupon collecting problem, in that rather than collecting individual coupons we obtain, at each time point, a random subset of coupons. The problem of interest is to determine the expected number of subsets needed until each coupon is contained in at least one of these subsets. We provide bounds on this number, give effic...
Publication date: 2017